Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Typecast strings to integers for SQL against INTEGER columns #342

Closed
wants to merge 2 commits into from
Closed

Typecast strings to integers for SQL against INTEGER columns #342

wants to merge 2 commits into from

Conversation

parhamr
Copy link

@parhamr parhamr commented Aug 21, 2013

Changes

  • Typecasts getParam values for SQL INT columns with PHP (int)
  • Maps PHP array values with intval when querying SQL INT columns
  • Ensures optimal MySQL performance, as the query planner (seen in version 5.6.11) can perform full table scans when given string values for integer columns

Tests

  • Code review, please!
  • No functionality or logical changes (0 and NULL both evaluate false)

…dd intval typecasting to values of arrays for INTEGER primary and foreign keys.
@parhamr
Copy link
Author

parhamr commented Oct 16, 2013

Here is an example of exactly how problematic quoted INT columns are (MySQL 5.6.11)…

mysql> explain DELETE FROM `catalog_product_index_price` WHERE entity_id IN('433284', 433283)\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: catalog_product_index_price
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 773625
        Extra: Using where
1 row in set (0.00 sec)

mysql> explain DELETE FROM `catalog_product_index_price` WHERE entity_id IN(433284, 433283)\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: catalog_product_index_price
         type: range
possible_keys: PRIMARY,IDX_CATALOGPRODUCTINDEXPRICE_ENTITYID
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 19
        Extra: Using where
1 row in set (0.00 sec)

Removing the quotes from the INT column allows MySQL to use the index and this reduces the rows scanned by 99.997 percent.

@aedmonds
Copy link

👍

@parhamr
Copy link
Author

parhamr commented Oct 16, 2013

BTW, general internet consensus is that MySQL’s behavior is intentional and not a bug, so the fix must occur in Magento’s internals and query constructors.

Ref:

@banks
Copy link

banks commented Oct 17, 2013

@parhamr someone brought this diff to my attention but I'm confused about your last comment.

Neither of the references provided suggest this is expected behaviour at all.

The percona one describes the reverse case - a varchar column compared to an int value and even explicitly says

Interesing enough it works other way around – you can refer to integer column as a string in most cases and MySQL will use the index, as for any string there is only one number which matches it.

The first link you gave is a totally different issue - the OP there had an INT field (32bit) and was trying to read with id='10000000000' which is too big for a 32 bit int causing an implicit cast not to work. Also no one there claimed it was expected behaviour from MySQL that cast would not work in general.

I have no specific link to magento, and no reason to suggest this code change is bad or unnecessary, I'm just interested in this because it seems a pretty major bug or breaking change in MySQL that will affect millions of users if it is infact a real change in behaviour.

Further, I can't see anything in MySQL changelog for 5.6.11 or any reference other than this on the web to the issue which seems highly unlikely if it's real...

Also I can't reproduce trivially in SQLFiddle: http://sqlfiddle.com/#!9/b4550/1. I can't try DELETE there so perhaps it is only affecting deletes or some other factor like schema, index type or number of rows. Can you reproduce this reliably with a simple test case?

Did you have any other reason than those two links to think this is an intentional change in MySQL?

Thanks

[edit]

My mistake, I realised that the trigger for it appears to be mixing types in a single IN clause rather than string comparison per se: see http://sqlfiddle.com/#!2/b4550/6. Incidentally IN('49', '50') works just fine, it only appears to be mixing string and in that causes the issue.

That is same with MySQL 5.5.x as well.

@banks
Copy link

banks commented Oct 17, 2013

Relevant: http://dev.mysql.com/doc/refman/5.6/en/comparison-operators.html#function_in

You should never mix quoted and unquoted values in an IN list because the comparison rules for quoted values (such as strings) and unquoted values (such as numbers) differ. Mixing types may therefore lead to inconsistent results.

So it's a valid thing to ensure your code is consistent but all strings is not an issue it seems.

@parhamr
Copy link
Author

parhamr commented Oct 17, 2013

@banks thanks for the feedback! I’ve updated the gist to reflect quoting both INT values produces the correct result: https://gist.github.com/parhamr/7011685

I have other examples of queries with quoted INT values that resulted in table scans instead of index hits. I’ll go find them and see what was different. (I suspect it’s in regard to joins.)

@@ -283,7 +283,7 @@ public function addValueSortToCollection($collection, $dir = 'asc')
->joinLeft(
array($tableName => $attributeTable),
"e.entity_id={$tableName}.entity_id"
. " AND {$tableName}.attribute_id='{$attributeId}'"
. " AND {$tableName}.attribute_id={$attributeId}"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@banks here’s the example of quoted INT columns in joins. I’ll go ask MySQL to explain this query…

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Not sure I understand that case. For example in http://sqlfiddle.com/#!2/04659/3 the quotes have no affect - you hit full table scan either way which is totally expected because left join does not restrict number or rows to return so if it returns all rows in left table then it certainly has to scan them all and no benefit comes from using an index anyway. If I add the restraint to WHERE as well then both queries correctly use index regardless of quoting. If I make it a straight join, both versions use index correctly.

If you found the quotes made a difference in this query, could you post the full query or modify my fiddle to show it please? I'm now just interested in understanding this in general :).

@markshust
Copy link
Contributor

Changeset looks good to me

@SchumacherFM
Copy link
Member

Looks really good. I hope you do have found all occurrences. Maybe even prices should be casts to (float).

But seeing array_map makes me cry: so slow. There is a better way https://gist.github.com/SchumacherFM/7035269

I hope Magento does accept your change and does not turn it down like mine: #201 (comment)

@parhamr
Copy link
Author

parhamr commented Oct 18, 2013

@SchumacherFM I’ve done pretty thorough checks, but I wouldn’t be surprised if more examples remain.

@verklov
Copy link
Contributor

verklov commented Nov 7, 2013

Hello parhamr,
Sorry for the delay with the response.
Thank you for your contribution! Our team will review your pull request and respond as soon as our analysis is complete.

@verklov
Copy link
Contributor

verklov commented Dec 1, 2013

parhamr,
We have completed investigating the issue. Unfortunately, we cannot accept your pull request.
As was mentioned in pull-request discussion, MySQL performance issue appears only when using both types in one IN clause
select * from some_table where id in (1, '2');

In others cases mysql still uses index for int field, so the following queries examples will not increase performance:
select * from some_table where id in ('1', '2');
select * from some_table where id = '1';
select * from some_table where id > '1';

Anyway it is better to solve the problem of typecast not in the client code, because it doesn't protect new code, but in ORM, which might be implemented if aligns with the product development roadmap.

@verklov verklov closed this Dec 1, 2013
@parhamr
Copy link
Author

parhamr commented Apr 2, 2014

ping @tim-bezhashvyly @astorm

@tim-bezhashvyly
Copy link

Hi @parhamr,

mixing types inside of IN() clause is a huge problem of cause but putting array_map in every possible place is definitely an overhead.

Regards,
Tim

magento-team pushed a commit that referenced this pull request Jun 12, 2015
magento-team pushed a commit that referenced this pull request Jan 28, 2016
magento-engcom-team added a commit that referenced this pull request Mar 6, 2019
 - Merge Pull Request magento/graphql-ce#342 from magento/graphql-ce:base-shipping-methods-checkout
 - Merged commits:
   1. 2cfeec7
   2. 15c9377
   3. c88e975
   4. b1a0bb6
   5. b4c032b
   6. d10e506
   7. 65ff06b
   8. a8c5504
   9. f3c9247
   10. 0ef7711
   11. e02dc94
   12. f2f05df
   13. b424d79
   14. 8b2106a
   15. a13639c
   16. bfacd22
   17. 031dc63
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants